A survey of using EHR as real-world evidence for discovering and validating new drug indications

Talukdar, Nabasmita, Zhang, Xiaodan, Paithankar, Shreya, Wang, Hui, Chen, Bin

arXiv.org Artificial Intelligence

Electronic Health Records (EHRs) have been increasingly used as real-world evidence (RWE) to support the discovery and validation of new drug indications. This paper surveys current approaches to EHR-based drug repurposing, covering data sources, processing methodologies, and representation techniques. It examines study designs and statistical frameworks for evaluating drug efficacy, and highlights key challenges in validation, with emphasis on the role of large language models (LLMs) and target trial emulation. By synthesizing recent developments and methodological advances, this work provides a foundational resource for researchers aiming to translate real-world data into actionable drug-repurposing evidence.
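
As a toy illustration of the kind of confounder-adjusted comparison that target trial emulation formalizes, the following Python sketch computes a stratified risk difference on fabricated data; the cohort, strata, and weighting scheme are our own invented stand-ins, not methods from any surveyed study.

```python
# Minimal synthetic sketch: a stratified risk-difference estimate of the kind
# that EHR-based target trial emulation formalizes. All data are fabricated.
from collections import defaultdict

# Each record: (treated?, outcome occurred?, confounder stratum).
cohort = [
    (True,  False, "young"), (True,  True,  "old"),
    (True,  False, "young"), (True,  False, "old"),
    (False, True,  "old"),   (False, False, "young"),
    (False, True,  "old"),   (False, False, "young"),
]

# Group outcomes by confounder stratum and treatment arm.
strata = defaultdict(lambda: {"treated": [], "control": []})
for treated, outcome, stratum in cohort:
    arm = "treated" if treated else "control"
    strata[stratum][arm].append(outcome)

def risk(outcomes):
    """Observed event rate within one arm of one stratum."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Stratum-specific risk differences, weighted by stratum size
# (a crude stand-in for propensity-score adjustment).
total = len(cohort)
adjusted_rd = sum(
    (risk(arms["treated"]) - risk(arms["control"]))
    * (len(arms["treated"]) + len(arms["control"])) / total
    for arms in strata.values()
)
print(f"Confounder-adjusted risk difference: {adjusted_rd:+.3f}")
```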


I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations

Kharchenko, Julia, Roosta, Tanya, Chadha, Aman, Shah, Chirag

arXiv.org Artificial Intelligence

This paper introduces a comprehensive benchmark for evaluating how Large Language Models (LLMs) respond to linguistic shibboleths: subtle linguistic markers that can inadvertently reveal demographic attributes such as gender, social class, or regional background. Through carefully constructed interview simulations using 100 validated question-response pairs, we demonstrate how LLMs systematically penalize certain linguistic patterns, particularly hedging language, despite equivalent content quality. Our benchmark generates controlled linguistic variations that isolate specific phenomena while maintaining semantic equivalence, which enables the precise measurement of demographic bias in automated evaluation systems. We validate our approach along multiple linguistic dimensions, showing that hedged responses receive 25.6% lower ratings on average, and demonstrate the benchmark's effectiveness in identifying model-specific biases. This work establishes a foundational framework for detecting and measuring linguistic discrimination in AI systems, with broad applications to fairness in automated decision-making contexts.
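
To make the measurement concrete, here is a minimal Python sketch of the benchmark's bookkeeping: generate hedged variants of a response with unchanged content, score each variant, and compare mean ratings. The hedge markers and the `rate_response` scorer are hypothetical stand-ins; in the actual benchmark the scoring is done by the LLM under evaluation.

```python
# Illustrative sketch: score semantically equivalent hedged vs. direct
# responses and report the mean rating gap between them.
HEDGES = ["I think ", "perhaps ", "it seems that "]

def hedge(response: str, marker: str) -> str:
    """Produce a hedged variant with the content left unchanged."""
    return marker + response[0].lower() + response[1:]

def rate_response(response: str) -> float:
    """Placeholder scorer; in the paper this role is played by the LLM under
    evaluation. Here we penalize hedging words only to show the bookkeeping,
    not to reproduce any real model's behavior."""
    penalty = sum(m.strip() in response for m in HEDGES)
    return 8.0 - 2.0 * penalty

pairs = []
for answer in ["I led the migration of our billing system to Kubernetes.",
               "I reduced query latency by rewriting the index layer."]:
    direct = rate_response(answer)
    hedged = sum(rate_response(hedge(answer, m)) for m in HEDGES) / len(HEDGES)
    pairs.append((direct, hedged))

gap = sum(d - h for d, h in pairs) / len(pairs)
print(f"Mean rating penalty for hedged variants: {gap:.2f} points")
```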


'The vehicle suddenly accelerated with our baby in it': the terrifying truth about why Tesla's cars keep crashing

The Guardian

It was a Monday afternoon in June 2023 when Rita Meier, 45, joined us for a video call. Meier told us about the last time she said goodbye to her husband, Stefan, five years earlier. He had been leaving their home near Lake Constance, Germany, heading for a trade fair in Milan. Meier recalled how he hesitated between taking his Tesla Model S or her BMW. He had never driven the Tesla that far before. He checked the route for charging stations along the way and ultimately decided to try it. Rita had a bad feeling. She stayed home with their three children, the youngest less than a year old. At 3.18pm on 10 May 2018, Stefan Meier lost control of his Model S on the A2 highway near the Monte Ceneri tunnel. "The collision with the guardrail launches the vehicle into the air, where it flips several times before landing," investigators would write later. The car came to rest more than 70 metres away, on the opposite side of the road, leaving a trail of wreckage. Several passersby tried to open the doors and rescue the driver, but they couldn't unlock the car. When they heard explosions and saw flames through the windows, they retreated. Even the firefighters, who arrived 20 minutes later, could do nothing but watch the Tesla burn.


Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data

Wu, Xue, Tsioutsiouliklis, Kostas

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, they often struggle with complex reasoning tasks and are prone to hallucination. Recent research has shown promising results in leveraging knowledge graphs (KGs) to enhance LLM performance. KGs provide a structured representation of entities and their relationships, offering a rich source of information that can enhance the reasoning capabilities of LLMs. In this work, we develop several techniques that tightly integrate KG structures and semantics into LLM representations. Our results show that we can significantly improve the performance of LLMs in complex reasoning scenarios and ground the reasoning process in KGs. We are the first to represent KGs in a programming language and to fine-tune pretrained LLMs on them. This integration facilitates more accurate and interpretable reasoning processes, paving the way for more advanced reasoning capabilities in LLMs.
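
The core idea, as we read it, of representing KG triples in a programming language can be sketched as follows; the `Entity` and `Relation` classes and the `link(...)` statement rendered into the output text are our own illustrative naming, not the paper's released code.

```python
# Hedged sketch: serialize knowledge graph structure as code so an LLM sees
# typed entities and relations rather than flat triples.
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    entity_type: str

@dataclass
class Relation:
    head: Entity
    relation: str
    tail: Entity

    def to_code(self) -> str:
        """Render the triple as executable-looking statements.
        `link` is illustrative corpus text, not a function defined here."""
        return (f'{self.head.name.lower()} = Entity("{self.head.name}", "{self.head.entity_type}")\n'
                f'{self.tail.name.lower()} = Entity("{self.tail.name}", "{self.tail.entity_type}")\n'
                f'link({self.head.name.lower()}, "{self.relation}", {self.tail.name.lower()})')

kg = [Relation(Entity("Aspirin", "Drug"), "treats", Entity("Headache", "Condition"))]
corpus = "\n\n".join(r.to_code() for r in kg)
print(corpus)  # text that could be mixed into a fine-tuning corpus
```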


Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

Wei, Timothy, Miin, Annabelle, Miin, Anastasia

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently demonstrated impressive capabilities across various real-world applications. However, due to the current text-in-text-out paradigm, it remains challenging for LLMs to handle dynamic and complex application constraints, let alone devise general solutions that meet predefined system goals. Current common practices like model finetuning and reflection-based reasoning often address these issues case-by-case, limiting their generalizability. To address this issue, we propose a flexible framework that enables LLMs to interact with system interfaces, summarize constraint concepts, and continually optimize performance metrics by collaborating with human experts.

Those methods usually rely on data curation reflecting on the deliberate reasoning path in specific application areas. When it comes to complex application constraints, high-quality solutions often demand a large volume of data to cover enough data cases and the corresponding reasoning logic. This process fundamentally differs from the typical human cognition process: faced with unfamiliar problems, people first seek to capture the overview of the underlying application constraints, which are potential rules summarized from observations. Next, these learned rules will be further refined when exception cases arise. The essence of this cognitive process lies in distilling rules and identifying minimal cases for refinement rather than depending on the inefficient ...
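
A minimal Python sketch of this propose-check-refine cycle, under our own assumptions, might look like the following: the model proposes a candidate rule, a system interface flags exception cases, and a human-in-the-loop discriminator refines the rule only on those minimal failing cases. `propose_rule`, `human_refine`, and `system_interface` are hypothetical stand-ins, not the paper's code.

```python
def propose_rule():
    """Stand-in for an LLM summarizing a constraint from observations."""
    return lambda plan: plan["cost"] <= 1000  # first guess: flat budget cap

def human_refine(rule, exception):
    """Stand-in for a human expert tightening the rule on a failing case."""
    return lambda plan: plan["cost"] <= plan["budget"]  # per-plan budget

def system_interface(plan):
    """Ground-truth check exposed by the application (hard-coded here)."""
    return plan["cost"] <= plan["budget"]

cases = [
    {"cost": 800, "budget": 1000},
    {"cost": 950, "budget": 900},  # exception: passes flat cap, violates budget
]

rule = propose_rule()
for plan in cases:
    if rule(plan) != system_interface(plan):
        rule = human_refine(rule, plan)  # refine only on minimal failing cases
print(all(rule(p) == system_interface(p) for p in cases))  # True after refinement
```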


How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions

Kharchenko, Julia, Roosta, Tanya, Chadha, Aman, Shah, Chirag

arXiv.org Artificial Intelligence

Large Language Models (LLMs) attempt to imitate human behavior by responding to humans in ways that please them, including by adhering to their values. However, humans come from diverse cultures with different values. It is therefore critical to understand whether LLMs showcase different values to users based on the stereotypical values of a user's known country. We prompt different LLMs with a series of advice requests based on the 5 Hofstede Cultural Dimensions, a quantifiable way of representing the values of a country. In each prompt, we incorporate personas representing 36 different countries and, separately, languages predominantly tied to each country, to analyze the consistency of the LLMs' cultural understanding. Through our analysis of the responses, we find that LLMs can differentiate between one side of a value and another and understand that countries have differing values, but do not always uphold those values when giving advice, and fail to recognize the need to answer differently based on different cultural values. Rooted in these findings, we present recommendations for training value-aligned and culturally sensitive LLMs. More importantly, the methodology and framework developed here can help further understand and mitigate culture and language alignment issues with LLMs.
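
The prompt-construction step described above is straightforward to sketch in Python; the dimension phrasings and country list below are toy examples of our own, not the paper's released prompts.

```python
# Illustrative prompt construction in the spirit of the paper's setup:
# country personas crossed with advice requests targeting Hofstede dimensions.
DIMENSIONS = {
    "individualism": "Should I prioritize my own career goals over my family's wishes?",
    "power_distance": "Should I openly challenge my manager's decision in a meeting?",
}
COUNTRIES = ["Japan", "United States", "Brazil"]

def build_prompts():
    """Yield one persona-conditioned advice prompt per (country, dimension)."""
    for country in COUNTRIES:
        for dim, question in DIMENSIONS.items():
            yield dim, country, f"You are advising someone from {country}. {question}"

for dim, country, prompt in build_prompts():
    print(f"[{dim} | {country}] {prompt}")
```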


Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop

Amirizaniani, Maryam, Yao, Jihan, Lavergne, Adrian, Okada, Elizabeth Snell, Chadha, Aman, Roosta, Tanya, Shah, Chirag

arXiv.org Artificial Intelligence

As LLMs become more pervasive across various users and scenarios, identifying potential issues when using these models becomes essential. Examples include bias, inconsistencies, and hallucination. Although auditing LLMs for these problems is desirable, it is far from easy or solved. An effective method is to probe the LLM with different versions of the same question, which can expose inconsistencies in its knowledge or operation, indicating potential bias or hallucination. However, to operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically. In this paper, we propose an automatic and scalable solution in which a different LLM is used along with humans in the loop. This approach offers verifiability and transparency, avoids circular reliance on the same LLM, and increases scientific rigor and generalizability. Specifically, we present a novel methodology with two phases of human verification: standardized evaluation criteria to verify responses, and a structured prompt template to generate the desired probes. Experiments on a set of questions from the TruthfulQA dataset show that we can generate a reliable set of probes from one LLM that can be used to audit inconsistencies in a different LLM. The criteria for generating and applying auditing probes are generalizable to various LLMs regardless of the underlying structure or training mechanism.
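
The two-phase pipeline might be wired together as in the following Python sketch; `generator_llm`, `human_verified`, and `audited_llm` are hypothetical stubs standing in for the probe-generating LLM, the human verification step, and the audited LLM, respectively.

```python
# Sketch of the two-phase probe pipeline as we understand it; only the
# control flow is illustrated, with all LLM calls stubbed out.
PROBE_TEMPLATE = ("Rewrite the question below in {n} semantically equivalent "
                  "ways, changing wording but not meaning.\nQuestion: {q}")

def generator_llm(prompt: str) -> list[str]:
    """Stand-in for the probe-generating LLM."""
    return ["What occurs if you eat watermelon seeds?",
            "Is swallowing watermelon seeds harmful?"]

def human_verified(probes: list[str]) -> list[str]:
    """Phase-one human check: keep only probes judged equivalent (stubbed)."""
    return probes

def audited_llm(question: str) -> str:
    """Stand-in for the different LLM being audited."""
    return "Nothing harmful happens; the seeds pass through your system."

question = "What happens if you eat watermelon seeds?"  # TruthfulQA-style item
probes = human_verified(generator_llm(PROBE_TEMPLATE.format(n=2, q=question)))
answers = {p: audited_llm(p) for p in probes}
# Phase two: humans apply standardized criteria to flag divergent answers.
print(len(set(answers.values())), "distinct answer(s) across probes")
```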


AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach

Amirizaniani, Maryam, Roosta, Tanya, Chadha, Aman, Shah, Chirag

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) gain wider adoption in various contexts, it becomes crucial to ensure they are reasonably safe, consistent, and reliable for the application at hand. This may require probing or auditing them. Probing an LLM with varied iterations of a single question can reveal potential inconsistencies in its knowledge or functionality. However, a tool for performing such audits with a simple workflow and a low technical threshold has been lacking. In this demo, we introduce "AuditLLM," a novel tool designed to evaluate the performance of various LLMs in a methodical way. AuditLLM's core functionality lies in its ability to audit a given LLM using multiple probes generated from a single question, thereby identifying any inconsistencies in the model's understanding or operation. A reasonably robust, reliable, and consistent LLM should output semantically similar responses to a question asked in different ways or by different people. Based on this assumption, AuditLLM produces easily interpretable results regarding the LLM's consistency from a single question entered by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucination, and other issues, so the output of AuditLLM can be used to investigate the audited LLM further. To facilitate demonstration and practical use, AuditLLM offers two key modes: (1) a live mode, which allows instant auditing of an LLM by analyzing its responses to real-time queries, and (2) a batch mode, which facilitates comprehensive auditing by processing multiple queries at once for in-depth analysis. The tool benefits both researchers and general users by enhancing our understanding of LLMs' response-generation capabilities through a standardized auditing platform.
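
One plausible way to compute such a consistency score is shown below; token-overlap Jaccard similarity is our own simple stand-in for whatever semantic similarity measure AuditLLM actually uses.

```python
# How a multiprobe consistency score might be computed: average pairwise
# similarity across the responses elicited by different probes.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude proxy for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

responses = [
    "The Eiffel Tower is in Paris, France.",
    "You can find the Eiffel Tower in Paris.",
    "It is located in Berlin.",  # an inconsistent outlier
]

scores = [jaccard(a, b) for a, b in combinations(responses, 2)]
consistency = sum(scores) / len(scores)
print(f"Mean pairwise consistency: {consistency:.2f}")  # low values flag issues
```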


System 2 Attention (is something you might need too)

Weston, Jason, Sukhbaatar, Sainbayar

arXiv.org Artificial Intelligence

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next-token generation. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to include only the relevant portions, then attends to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinionated or irrelevant information: QA, math word problems, and long-form generation, where S2A increases factuality and objectivity and decreases sycophancy.
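
The two-stage procedure can be sketched directly from this description; `llm` below is a placeholder completion function (a canned stub so the example runs end-to-end), not any specific model API, and the exact prompt wording is our own.

```python
def llm(prompt: str) -> str:
    """Stand-in completion so the sketch runs without a real model client."""
    if prompt.startswith("Extract"):
        return "My textbook says the capital of France is Paris."
    return "Paris."

def s2a_answer(context: str, question: str) -> str:
    # Stage 1: regenerate the context, keeping only relevant, objective parts.
    cleaned = llm(
        "Extract only the parts of the following text that are relevant and "
        "objective for answering the question; drop opinions and asides.\n"
        f"Text: {context}\nQuestion: {question}"
    )
    # Stage 2: answer from the regenerated context alone.
    return llm(f"Context: {cleaned}\nQuestion: {question}\nAnswer:")

context = ("I think the answer is probably Marseille, but my textbook "
           "says the capital of France is Paris.")
print(s2a_answer(context, "What is the capital of France?"))
```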


Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey

Bilik, Simon, Zemcik, Tomas, Kratochvila, Lukas, Ricanek, Dominik, Richter, Milos, Zambanini, Sebastian, Horak, Karel

arXiv.org Artificial Intelligence

The wide availability of machine learning and computer vision techniques allows the development of relatively complex monitoring systems in many domains. Beyond the traditional industrial domain, new applications are also appearing in biology and agriculture, including the detection of infections, parasites, and weeds, as well as automated monitoring and early warning systems. This trend is also connected with the introduction of easily accessible hardware and development kits such as the Arduino and Raspberry Pi families. In this paper, we survey 50 existing papers on automated beehive monitoring methods based on computer vision, focusing in particular on pollen and Varroa mite detection together with bee traffic monitoring. Such systems could also be used to monitor honeybee colonies and inspect their health, identifying potentially dangerous states before the situation becomes critical, or to better plan periodic colony inspections and thereby save significant costs. We also analyze research trends in this application field and outline possible directions for future exploration. Our paper is additionally aimed at veterinary and apidology professionals who may not be familiar with machine learning; to introduce them to its possibilities, each family of applications opens with a brief theoretical introduction and the motivation behind its underlying method. We hope that this paper will inspire other scientists to apply machine learning techniques to further applications in beehive monitoring.
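
As a taste of the simplest technique family covered by such surveys, the following dependency-free Python sketch uses frame differencing at a hive entrance as a crude bee-traffic signal; real systems in the surveyed papers rely on full OpenCV pipelines or CNN detectors, and the frames here are synthetic.

```python
# Minimal sketch: frame differencing as a crude bee-traffic activity signal.
def frame_diff_activity(prev, curr, threshold=30):
    """Fraction of pixels whose intensity changed by more than `threshold`."""
    changed = sum(
        abs(p - c) > threshold
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
    )
    return changed / (len(prev) * len(prev[0]))

# Two tiny synthetic grayscale frames; a "bee" (dark blob) moves one pixel.
frame_a = [[200] * 8 for _ in range(6)]
frame_b = [row[:] for row in frame_a]
frame_a[2][2] = frame_a[2][3] = 40   # bee at position 1
frame_b[2][3] = frame_b[2][4] = 40   # bee at position 2

print(f"Activity score: {frame_diff_activity(frame_a, frame_b):.3f}")
```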